Stereotypical gender actions can be extracted from web text

نویسندگان

  • Amac Herdagdelen
  • Marco Baroni
چکیده

We extracted gender-speci c actions from text corpora and Twitter, and compared them to stereotypical expectations of people. We used Open Mind Common Sense (OMCS), a commonsense knowledge repository, to focus on actions that are pertinent to common sense and daily life of humans. We use the gender information of Twitter users and Webcorpus-based pronoun/name gender heuristics to compute the gender bias of the actions. With high recall, we obtained a Spearman correlation of 0.47 between corpus-based predictions and a human gold standard, and an area under the ROC curve of 0.76 when predicting the polarity of the gold standard. We conclude that it is feasible to use natural text (and a Twitter-derived corpus in particular) in order to augment commonsense repositories with the stereotypical gender expectations of actions. We also present a dataset of 441 commonsense actions with human judges' ratings on whether the action is typically/slightly masculine/feminine (or neutral), and another larger dataset of 21,442 actions automatically rated by the methods we investigate in this study. ∗This is a preprint of an article accepted for publication in Journal of the American Society for Information Science and Technology copyright c ©2011 (American Society for Information Science and Technology). †Current a liation: New England Complex Systems Institute, Cambridge, USA.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

From Questions to Needs: Extracting Actionable Need Knowledge from the Web

The Web has become a source of information and knowledge accessed through search engines and various mining tools. In addition, it has become a platform on which not only software but also human users participate in and interact for various tasks. In an effort to mine actionable knowledge from the Web associated with various objects, we devised a text mining method specifically for extracting t...

متن کامل

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

متن کامل

Data Extraction using Content-Based Handles

In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. In our extraction algorithm, we inspired by the way a human user scans the page content for specific data. In particular, we use text fea...

متن کامل

Author gender identification from text using Bayesian Random Forest

Nowadays high usage of users from virtual environments and their connection via social networks like Facebook, Instagram, and Twitter shows the necessity of finding out shared subjects in this environment more than before. There are several applications that benefit from reliable methods for inferring age and gender of users in social media. Such applications exist across a wide area of fields,...

متن کامل

Prediction of user's trustworthiness in web-based social networks via text mining

In Social networks, users need a proper estimation of trust in others to be able to initialize reliable relationships. Some trust evaluation mechanisms have been offered, which use direct ratings to calculate or propagate trust values. However, in some web-based social networks where users only have binary relationships, there is no direct rating available. Therefore, a new method is required t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • JASIST

دوره 62  شماره 

صفحات  -

تاریخ انتشار 2011